In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than taking a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces.
translated by Google Translate
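The goal of robust reinforcement learning (RL) is to learn a policy that is robust against uncertainty in the model parameters. Parameter uncertainty commonly arises in real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to learn the policy that maximizes the value against the worst possible model in an uncertainty set. In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI), which uses only an offline dataset to learn the optimal robust policy. Robust RL with offline data is significantly more challenging than its non-robust counterpart because of the minimization over all models in the robust Bellman operator. This poses challenges in offline data collection, optimization over the models, and unbiased estimation. We propose a systematic approach to overcome these challenges, resulting in our RFQI algorithm. We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems.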
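For orientation, the minimization that the abstract refers to can be written out in the usual textbook form of the robust Bellman operator. The notation here ($r$, $\gamma$, and the per-state-action uncertainty set $\mathcal{P}_{s,a}$) is an assumption for illustration and is not taken from the paper itself:

```latex
(\mathcal{T} Q)(s, a) \;=\; r(s, a) \;+\; \gamma \min_{P \in \mathcal{P}_{s,a}} \; \mathbb{E}_{s' \sim P}\Big[ \max_{a'} Q(s', a') \Big]
```

The inner minimum ranges over transition models rather than over samples, which is why it cannot be estimated directly from a single offline dataset collected under one model.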
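We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using data collected by a logging policy. The most popular approaches to OPE are variants of the doubly robust (DR) estimator, obtained by combining a direct method (DM) estimator with a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach, the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator, that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from "closer" contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to state-of-the-art OPE algorithms on a number of benchmark problems.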
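As a point of reference, the standard DR estimator that the abstract builds on can be sketched as below. This is the textbook DM-plus-IPS-correction form, not the DR-IC estimator; the function name, argument names, and array layout are all assumptions made for illustration.

```python
import numpy as np

def doubly_robust_value(rewards, actions, logging_probs, target_probs, reward_model):
    """Standard doubly robust (DR) estimate of a target policy's value.

    All names and shapes are illustrative assumptions:
      rewards:       (n,)   observed reward of the logged action
      actions:       (n,)   index of the logged action
      logging_probs: (n,)   logging-policy probability of the logged action
      target_probs:  (n, k) target-policy probabilities over k actions
      reward_model:  (n, k) predicted reward for every action (DM component)
    """
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions, dtype=int)
    logging_probs = np.asarray(logging_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    reward_model = np.asarray(reward_model, dtype=float)

    idx = np.arange(len(rewards))
    # Direct method: expected predicted reward under the target policy.
    dm = np.sum(target_probs * reward_model, axis=1)
    # Inverse propensity score (importance weight) of the logged action.
    ips = target_probs[idx, actions] / logging_probs
    # Correction term: reweighted residual of the reward model.
    correction = ips * (rewards - reward_model[idx, actions])
    return float(np.mean(dm + correction))
```

With an all-zero reward model the estimate reduces to plain IPS, and with a perfect reward model the correction term vanishes and the estimate equals the DM value. Large values of `ips` are the source of the variance that the abstract mentions.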
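The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against parameter uncertainties arising from the mismatch between the simulator model and the real world. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model lying in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires knowledge of the nominal model to compute the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, the chi-square divergence, and the KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of the proposed algorithm. In addition to the sample complexity results, we present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.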
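The max-min formulation can be illustrated with a minimal robust dynamic programming sketch. It assumes the models are known and that the uncertainty set is a small finite collection of transition kernels, with the worst case taken separately for each state-action pair; the paper instead learns from samples and uses total variation, chi-square, and KL balls around an unknown nominal model. The function name and array layout are hypothetical.

```python
import numpy as np

def robust_value_iteration(models, rewards, gamma=0.9, iters=200):
    """Robust value iteration over a finite set of candidate models.

    Illustrative assumptions:
      models:  (m, S, A, S) array, m candidate transition kernels
      rewards: (S, A) array of immediate rewards
    Returns the robust value function and a greedy robust policy.
    """
    models = np.asarray(models, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    v = np.zeros(rewards.shape[0])
    for _ in range(iters):
        # Expected next-state value under every candidate model: (m, S, A).
        next_v = models @ v
        # Worst case over models for each (state, action) pair.
        q = rewards + gamma * next_v.min(axis=0)
        # Best action against that worst case.
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

In a two-state example where one action pays more but may lead to a zero-reward absorbing state under one of the candidate models, the robust policy picks the safer action, whereas planning against the optimistic model alone would pick the risky one.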
The ability to distinguish between different movie scenes is critical for understanding the storyline of a movie. However, accurately detecting movie scenes is often challenging as it requires the ability to reason over very long movie segments. This is in contrast to most existing video recognition models, which are typically designed for short-range video analysis. This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. Our model, dubbed TranS4mer, is built using a novel S4A building block, which combines the strengths of structured state-space sequence (S4) and self-attention (A) layers. Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies. Afterward, the state-space operation in the S4A block is used to aggregate long-range inter-shot cues. The final TranS4mer model, which can be trained end-to-end, is obtained by stacking multiple S4A blocks one after the other. Our proposed TranS4mer outperforms all prior methods on three movie scene detection datasets (MovieNet, BBC, and OVSD), while also being $2\times$ faster and requiring $3\times$ less GPU memory than standard Transformer models. We will release our code and models.
A biological system is a complex network of heterogeneous molecular entities and their interactions, which together contribute to various biological characteristics of the system. However, current biological networks are noisy, sparse, and incomplete, limiting our ability to create a holistic view of the biological system and understand the biological phenomena. Experimental identification of such interactions is both time-consuming and expensive. With recent advances in high-throughput data generation and significant improvements in computational power, various computational methods have been developed to predict novel interactions in the noisy network. Recently, deep learning methods such as graph neural networks have shown their effectiveness in modeling graph-structured data and achieved good performance in biomedical interaction prediction. However, graph neural network-based methods require human expertise and experimentation to choose an appropriate model complexity, and this choice significantly impacts the performance of the model. Furthermore, deep graph neural networks face overfitting problems and tend to be poorly calibrated, with high confidence on incorrect predictions. To address these challenges, we propose Bayesian model selection for graph convolutional networks to jointly infer the most plausible number of graph convolution layers (depth) warranted by the data and perform dropout regularization simultaneously. Experiments on four interaction datasets show that our proposed method achieves accurate and calibrated predictions. Our proposed method enables graph convolutional networks to dynamically adapt their depths to accommodate an increasing number of interactions.
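We study efficient learning from higher-order graph convolutions and learning directly from the adjacency matrix for node classification. We revisit the scaled graph residual network, remove the ReLU activation from the residual layers, and apply a single weight matrix at each residual layer. We show that the resulting model leads to new graph convolution models that are polynomials of the normalized adjacency matrix, the residual weight matrix, and the residual scaling parameter. Additionally, we propose adaptive learning between the graph polynomial convolution models and learning directly from the adjacency matrix. Furthermore, we propose fully adaptive models that learn the scaling parameter at each residual layer. We show that the generalization bounds of the proposed methods are polynomials of the eigenvalue spectrum, the scaling parameters, and the upper bounds of the residual weights. Through theoretical analysis, we argue that the proposed models can obtain improved generalization bounds by limiting the higher orders of convolution and by learning directly from the adjacency matrix. Using a wide set of real data, we demonstrate that the proposed methods obtain improved accuracy for node classification on non-homophilous graphs.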
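Neural networks have proven to be a formidable tool for tackling the problem of speech coding at very low bit rates. However, the design of a neural coder that can operate robustly under real-world conditions remains a major challenge. Therefore, we present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration that relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.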
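This paper presents how the domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well because of the large search/optimization space. Here, we propose an actor-critic-based agent to address the combinatorial nature of the problem and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics to enhance agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed on most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL. The agent was tested by RTE for the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use.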
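We investigate adaptive layer-wise graph convolution in deep GCN models. We propose AdaGPR to learn generalized PageRanks at each layer of a GCNII network to induce adaptive convolution. We show that the generalization bound for AdaGPR is bounded by a polynomial of the eigenvalue spectrum of the normalized adjacency matrix, in the order of the number of generalized PageRank coefficients. By analyzing the generalization bounds, we show that over-smoothing depends on both the convolutions by higher orders of the normalized adjacency matrix and the depth of the model. We performed evaluations on node classification using real benchmark data and show that AdaGPR provides improved accuracy compared to existing graph convolution networks while demonstrating robustness against over-smoothing. Further, we demonstrate that analyzing the coefficients of the layer-wise generalized PageRanks allows us to qualitatively understand the convolution at each layer, enabling model interpretation.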
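To make the notion of a generalized PageRank convolution concrete, the sketch below computes a weighted sum of powers of the normalized adjacency matrix applied to node features. It is a minimal illustration under stated assumptions: the coefficients are fixed inputs rather than learned per layer, no GCNII layer structure is included, and the function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency matrix with self-loops added."""
    a = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gpr_propagate(adj, features, coeffs):
    """Return sum_k coeffs[k] * A_hat^k @ features.

    coeffs are the generalized PageRank weights; here they are fixed
    inputs (an assumption), whereas the abstract describes learning them.
    """
    a_hat = normalized_adjacency(adj)
    h = np.asarray(features, dtype=float)
    out = np.zeros_like(h)
    for c in coeffs:
        out += c * h      # contribution of the current power of A_hat
        h = a_hat @ h     # advance to the next power
    return out
```

Coefficients concentrated on low orders keep the output close to the input features, while weight on high orders mixes information from distant nodes, which is the regime where over-smoothing becomes a concern.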